Icon Plots. One of the potentially powerful general techniques of exploratory data analysis are multidimensional icon graphs. The basic idea of icon plots is to represent individual units of observation as particular graphical objects where values of variables are assigned to specific features or dimensions of the objects (usually one case = one object). The assignment is such that the overall appearance of the objects changes as a function of the configuration of values. Thus, the objects are given visual "identities" that are unique for configurations of values and that can be identified by the observer. Examining such icons may help to discover specific clusters of both simple relations and interactions between variables.

Icon Plots - Chernoff Faces. Chernoff faces is the most "elaborate" type of icon plot. A separate "face" icon is drawn for each case; relative values of the selected variables for each case are assigned to shapes and sizes of individual facial features (e.g., length of nose, angle of eyebrows, width of face).

Icon Plots - Columns. In this type of icon plot, an individual column graph is plotted for each case; relative values of the selected variables for each case are represented by the height of consecutive columns.

Icon Plots - Lines. In this type of icon plot, an individual line graph is plotted for each case; relative values of the selected variables for each case are represented by the height of consecutive break points of the line above the baseline.

Icon Plots - Pies. In this type of icon plot, data values for each case are plotted as a pie chart (clockwise, starting at 12:00); relative values of selected variables are represented by the size of the pie slices.

Icon Plots - Polygons. In this type of icon plot, a separate polygon icon is plotted for each case; relative values of the selected variables for each case are represented by the distance from the center of the icon to consecutive corners of the polygon (clockwise, starting at 12:00).

Icon Plots - Profiles. In this type of icon plot, an individual area graph is plotted for each case; relative values of the selected variables for each case are represented by the height of consecutive peaks of the profile above the baseline.

Icon Plots - Stars. In this type of icon plot, a separate star-like icon is plotted for each case; relative values of the selected variables for each case are represented (clockwise, starting at 12:00) by the relative length of individual rays in each star. The ends of the rays are connected by a line.

Icon Plots - Sun Rays. In this type of icon plot, a separate sun-like icon is plotted for each case; each ray represents one of the selected variables (clockwise, starting at 12:00), and the length of the ray represents 4 standard deviations. Data values of the variables for each case are connected by a line.

Incremental (vs. Non-Incremental Learning Algorithms). Methods (algorithms) for predictive data mining are also referred to as "learning" algorithms, because they derive information from the data to predict new observations. These algorithms can be divided into those that require one or perhaps two complete passes through the input data, and those that require iterative multiple access to the data to complete the estimation. The former type of algorithms are also sometimes referred to as incremental learning algorithms, because they will complete the computations necessary to fit the respective models by processing one case at a time, each time "refining" the solution; then, when all cases have been processed, only few additional computations are necessary to produce the final results. Non-incremental learning algorithms are those that need to process all observations in each iteration of an iterative procedure for refining a final solution. Obviously, incremental learning algorithms are usually much faster than non-incremental algorithms, and for extremely large data sets, non-incremental algorithms may not be applicable at all (without first sub-sampling).

Independent vs. Dependent Variables. The terms dependent and independent variable apply mostly to experimental research where some variables are manipulated, and in this sense they are "independent" from the initial reaction patterns, features, intentions, etc. of the subjects. Some other variables are expected to be "dependent" on the manipulation or experimental conditions. That is to say, they depend on "what the subject will do" in response. Independent variables are those that are manipulated whereas dependent variables are only measured or registered.

Somewhat contrary to the nature of this distinction, these terms are also used in studies where we do not literally manipulate independent variables, but only assign subjects to "experimental groups" based on some preexisting properties of the subjects. For example, if in an experiment, males are compared with females regarding their white cell count (WCC), Gender could be called the independent variable and WCC the dependent variable.

See Dependent vs. independent variables for more information.

Inertia. The term inertia in correspondence analysis is used by analogy with the definition in applied mathematics of "moment of inertia," which stands for the integral of mass times the squared distance to the centroid (e.g., Greenacre, 1984, p. 35). Inertia is defined as the total Pearson Chi-square for a two-way frequency table divided by the total sum of all observations in the table.

In-Place Database Processing (IDP). In-Place Database Processing (IDP) is an advanced database access technology developed at StatSoft to support high-performance, direct interface between external data sets residing on remote servers and the analytic functionality of data analysis software (such as STATISTICA products) residing on the client computers. The IDP technology has been developed to facilitate accessing data in large databases using a one-step process which does not necessitate creating local copies of the data set. IDP significantly increases the overall performance of data processing software; it is particularly well suited for large data mining and exploratory data analysis tasks.

The source of the IDP performance gains. The speed gains of the IDP technology - over accessing data in a traditional way - result not only from the fact that IDP allows the client data analysis software to access data directly in databases and skip the otherwise necessary step of first importing the data and creating a local data file, but also from its "multitasking" (technically, asynchronous and distributed processing) architecture. Specifically, IDP uses the processing resources (multiple CPUs) of the database server computers to execute the query operations, extract the requested records of data and send them to the client computer, while the data analysis software on the client computer is simultaneously processing these records as they arrive.

Interactions. An effect of interaction occurs when a relation between (at least) two variables is modified by (at least one) other variable. In other words, the strength or the sign (direction) of a relation between (at least) two variables is different depending on the value (level) of some other variable(s). (The term interaction was first used by Fisher, 1926). Note that the term "modified" in this context does not imply causality but represents a simple fact that depending on what subset of observations (regarding the "modifier" variable(s)) you are looking at, the relation between the other variables will be different.

For example, imagine that we have a sample of highly achievement-oriented students and another of achievement "avoiders." We now create two random halves in each sample, and give one half of each sample a challenging test, the other an easy test. We measure how hard the students work on the test. The means of this (fictitious) study are as follows:

Achievement-
oriented Achievement-
avoiders

Challenging Test
Easy Test 10
5 5
10

	Achievement- oriented	Achievement- avoiders
Challenging Test Easy Test	10 5	5 10

How can we summarize these results? Is it appropriate to conclude that (1) challenging tests make students work harder, (2) achievement-oriented students work harder than achievement-avoiders? None of these statements captures the essence of this clearly systematic pattern of means. The appropriate way to summarize the result would be to say that challenging tests make only achievement-oriented students work harder, while easy tests make only achievement-avoiders work harder. In other words, the relation between the type of test and effort is positive in one group but negative in the other group. Thus, the type of achievement orientation and test difficulty interact in their effect on effort; specifically, this is an example of a two-way interaction between achievement orientation and test difficulty. (Note that statements 1 and 2 above would describe so-called main effects.)

For more information regarding interactions, see Interaction Effects in the ANOVA chapter.

Interpolation. Projecting a curve between known data points to infer the value of a function at points between.

Interval Scale. This scale of measurement allows you to not only rank order the items that are measured, but also to quantify and compare the sizes of differences between them (no absolute zero is required).